ABSTRACT

R x L triply balanced matrices arise in cross-validation
studies and in estimating the mean square errors of nonlinear
statistics in large scale survey sampling. It is
shown that: (1) Any R x L triply balanced matrix and an
orthogonal array OA(R,L,2,3; $\lambda$) are one and the same
object up to a possible notational change of the two symbols
of the array to + and - respectively, (2) R is a
multiple of 8 and $L \le R/2$, and (3) The problem of the
construction of R x L triply balanced matrices, $3 \le L \le R/2$,
is completely resolved modulo the existence of Hadamard
matrices of order R/2.
A COMPLETE CHARACTERIZATION OF TRIPLY
BALANCED MATRICES WITH APPLICATIONS
TO SURVEY SAMPLING

BY A. HEDAYAT AND H. PESOTAN

UNIVERSITY  OF  ILLINOIS  AT  CHICAGO  AND  UNIVERSITY  OF  GUELPH 

1.  Introduction.  The  technique  of  subsampling  has  been 
shown  to  be  a  very  powerful  method  in  cross-validation 
studies  and  in  estimating  the  mean  square  errors  of 
nonlinear  statistics  such  as  ratios,  correlation  coefficients 
and  regression  coefficients.  The  contributions  of  McCarthy 
(1966, 1969, 1976), Gurney and Jewett (1975), Lemeshow and
Levy (1978), Krewski and Rao (1981) and Rao and Wu (1983)
on this topic are noteworthy. Additional contributions in
this area may be found in the references of the papers just
mentioned.

The  problem  that  we  shall  deal  with  here  falls  in  the 
area  of  subsampling  known  as  balanced  half  sample  replication 
(BHSR)  introduced  and  studied  by  McCarthy  (1966,1969)  for 
stratified  samples.  The  set  up  is  briefly  as  follows: 

The sampled population consists of L strata and we have
a random sample of two primary sampling units (PSU) from each
stratum. Thus, let $y_{h1}$, $y_{h2}$ be the two observations
related to the two PSU in stratum h, h = 1,2,...,L.
Identify $y_{h1}$ with + and $y_{h2}$ with -. Prepare an R x L
matrix $A = (\delta_{rh})$ with entries +1 or -1 such that in
each column of A there are as many +1 as -1, and in
addition any two columns of A are orthogonal. Now identify
the L columns of A with the L strata. Form R half
subsamples of size L determined by the R rows of A.

Thus the half subsample based on the i-th row of A is
obtained as follows. If the (i,j)-th entry of A is +1
take $y_{j1}$ in the i-th subsample, otherwise $y_{j2}$ will be
in the subsample. Let $\theta$ be the parameter of interest which
is to be estimated based on the data. Let $\hat{\theta}_i$ be an
estimator of $\theta$ based on the i-th half subsample and $\hat{\theta}$
an estimator of $\theta$ based on the entire 2L data points.
Then

$$v(\hat{\theta}) = c \sum_{i=1}^{R} (\hat{\theta}_i - \hat{\theta})^2$$

is suggested as an estimator
of the variance of $\hat{\theta}$ for a properly chosen constant c.
Several authors have demonstrated that there are cases for
which $v(\hat{\theta})$ behaves nicely and can be utilized in practice.
Indeed, if $\hat{\theta}$ is a nonlinear statistic in the data, then
perhaps one has no choice but to use $v(\hat{\theta})$ or some other
similar statistic based on jackknifing or bootstrapping the
data.
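
As a rough illustration of the replication scheme just described, the following Python sketch computes $v(\hat{\theta})$ for a user-supplied estimator. The function name `bhsr_variance`, the default c = 1/R, the sample mean used as the estimator, and the data values are all assumptions made for this example; the text only asks for "a properly chosen constant c".

```python
def bhsr_variance(y, A, estimator, c=None):
    """Half-sample replication variance sketch.

    y         : list of L pairs (y_h1, y_h2), one pair per stratum
    A         : list of R rows, each a list of L entries +1 / -1
    estimator : function mapping a list of observations to a number
    c         : scaling constant; 1/R is an assumed default
    """
    R, L = len(A), len(y)
    if c is None:
        c = 1.0 / R
    # Estimate based on all 2L observations.
    theta_full = estimator([obs for pair in y for obs in pair])
    total = 0.0
    for row in A:
        # +1 selects y_h1 and -1 selects y_h2 in stratum h.
        half = [y[h][0] if row[h] == 1 else y[h][1] for h in range(L)]
        total += (estimator(half) - theta_full) ** 2
    return c * total


def mean(values):
    return sum(values) / len(values)


# Three strata with two PSUs each (made-up numbers).
y = [(1.0, 3.0), (2.0, 6.0), (5.0, 5.0)]
# A 4x3 matrix whose columns are balanced and mutually orthogonal.
A = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
v = bhsr_variance(y, A, mean)
```

Any nonlinear statistic (a ratio, a correlation coefficient) can be passed in place of `mean`, provided it accepts a plain list of observations.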

Rao and Wu (1983) have done a serious analytical study
of the above mentioned technique for a general nonlinear
statistic $\hat{\theta}$ and made the following interesting discovery
in the context of BHSR. They proved that if the matrix A
has the additional feature that for any 3 of its columns
h, s and t,

$$\sum_{r} \delta_{rh}\,\delta_{rs}\,\delta_{rt} = 0, \qquad h \ne s \ne t;\ h,s,t = 1,2,\ldots,L,$$

then the statistic $v(\hat{\theta})$ enjoys additional statistical
regularities. In the summer of 1984 J.N.K. Rao presented
this result, obtained jointly with Jeff Wu, in the workshop
on Efficient Data Collection held at the University of
California at Berkeley. He proposed the existence and the
construction of such matrices A as an open problem. In
this paper we solve this problem. Of course, as we
shall see, this additional demand on the matrix A requires
that more subsamples be taken than otherwise. This
means that the practitioner has to balance the need for more
statistical regularity against the cost of more computation.


2. Preliminaries. An R x L matrix $A = (\delta_{rh})$, where
$\delta_{rh}$ = +1 or -1, will be called triply balanced if and only
if

(i) $\sum_{r} \delta_{rh} = 0$, h = 1,2,...,L,

(ii) $\sum_{r} \delta_{rh}\,\delta_{rs} = 0$, $h \ne s$; h,s = 1,2,...,L,

(iii) $\sum_{r} \delta_{rh}\,\delta_{rs}\,\delta_{rt} = 0$, $h \ne s \ne t$; h,s,t = 1,2,...,L.


Observe that (i) and (ii) imply respectively that each
column of A is orthogonal to the vector all of whose
entries are +1, and that any two columns of A are
orthogonal. Condition (iii) carries no usual orthogonality
implication but is considered here since it evolved in the
context of survey sampling [Rao and Wu (1983)]. The
choice of the term balanced in the above definition will be
justified in the next section.

In  writing  out  matrices,  +  and  -  are  used  as  abbreviations 
for  +1  and  -1.  The  existence  and  construction  of  triply 
balanced  matrices  will  be  taken  up  in  Section  3.  Below  is 
an  example  of  an  8x4  triply  balanced  matrix: 
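
As a hedged illustration of the definition, the Python sketch below checks conditions (i), (ii) and (iii) directly on one 8x4 matrix of +1/-1 entries. The particular matrix (a 4x4 Hadamard matrix stacked on its negative) is an assumption chosen for this example and is not necessarily the matrix the authors displayed; the names `is_triply_balanced`, `EXAMPLE_8x4` and `ONLY_DOUBLY_BALANCED` are likewise hypothetical.

```python
from itertools import combinations


def is_triply_balanced(A):
    """Check conditions (i), (ii), (iii) for a +1/-1 matrix given as rows."""
    cols = list(zip(*A))
    L = len(cols)
    # (i) every column sums to zero.
    cond_i = all(sum(col) == 0 for col in cols)
    # (ii) every pair of distinct columns is orthogonal.
    cond_ii = all(
        sum(x * y for x, y in zip(cols[h], cols[s])) == 0
        for h, s in combinations(range(L), 2)
    )
    # (iii) the entrywise product of any three distinct columns sums to zero.
    cond_iii = all(
        sum(x * y * z for x, y, z in zip(cols[h], cols[s], cols[t])) == 0
        for h, s, t in combinations(range(L), 3)
    )
    return cond_i and cond_ii and cond_iii


# Illustrative 8x4 matrix: a 4x4 Hadamard matrix on top of its negative.
EXAMPLE_8x4 = [
    [ 1,  1,  1,  1],
    [ 1, -1,  1, -1],
    [ 1,  1, -1, -1],
    [ 1, -1, -1,  1],
    [-1, -1, -1, -1],
    [-1,  1, -1,  1],
    [-1, -1,  1,  1],
    [-1,  1,  1, -1],
]

# A 4x3 matrix satisfying (i) and (ii) but not (iii).
ONLY_DOUBLY_BALANCED = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
```

The second matrix shows that conditions (i) and (ii) alone do not force condition (iii): the product of its three columns is +1 in every row.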


3. Existence and construction. It is useful to discover for which values of R and L
triply balanced matrices exist and how they may be constructed.
We shall see that the three conditions of the definition
place some restrictions on R and L so that such matrices
cannot be easily constructed.

Since $L \ge 3$, condition (ii) implies that $R \equiv 0 \pmod 4$.
We shall show that condition (iii) puts further restrictions
on R. Indeed, it will turn out that such matrices
exist only if $R \equiv 0 \pmod 8$ and $3 \le L \le R/2$. To establish
this and its consequences we begin with some notation and
a lemma.


Let $f_{hs}(i,j)$ be the frequency with which the pair
(i,j) occurs in any two distinct columns h and s of A,
where $i,j \in \{+,-\}$. Similarly, let $f_{hst}(i,j,k)$ be the
frequency with which the triple (i,j,k) occurs in any
three distinct columns h,s,t of A, where $i,j,k \in \{+,-\}$.


LEMMA 3.1. If A is an R x L triply balanced matrix, then

(1) $f_{hs}(i,j) = R/4$, for all choices of $h \ne s$;

(2) $f_{hst}(i,j,k) = R/8$, for all choices of $h \ne s \ne t$, and
consequently $R \equiv 0 \pmod 8$.

PROOF. (1). By condition (i), each column of A has
R/2 plus ones and R/2 minus ones. Hence

$$f_{hs}(+,+) + f_{hs}(+,-) = f_{hs}(+,+) + f_{hs}(-,+) = R/2.$$

By condition (ii),

$$f_{hs}(+,+) - f_{hs}(+,-) - f_{hs}(-,+) + f_{hs}(-,-) = 0.$$

From these equations, together with the fact that the four
frequencies sum to R, (1) follows.

(2). To simplify the notation let

$$f_{hst}(+,+,+) = a, \quad f_{hst}(+,+,-) = b, \quad f_{hst}(+,-,+) = c, \quad f_{hst}(+,-,-) = d,$$
$$f_{hst}(-,+,+) = e, \quad f_{hst}(-,+,-) = f, \quad f_{hst}(-,-,+) = g, \quad f_{hst}(-,-,-) = m.$$

By condition (i), a + b + c + d = R/2. By (1) and condition (ii)
applied in turn to columns h and s, h and t, and s and
t respectively we have

a + b = R/4    a + c = R/4    a + e = R/4
c + d = R/4    b + d = R/4    b + f = R/4
e + f = R/4    e + g = R/4    c + g = R/4
g + m = R/4    f + m = R/4    d + m = R/4.

From the above equations it follows that b = c = e = m
and f = g = d. Now by condition (iii),

$$a - b - b + [R/2 - (a+b+c)] - b + [R/2 - (a+b+c)] + [R/2 - (a+b+c)] - b = 0,$$

which simplifies to a + 5b = 3R/4. Since a + b = R/4
we conclude that a = b = c = d = e = f = g = m = R/8 and (2) follows.
Since $f_{hst}(i,j,k)$ is an integer, $R \equiv 0 \pmod 8$,
and this completes the proof.
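
The conclusions of Lemma 3.1 can be observed numerically. The sketch below tabulates how often each sign pattern occurs in every pair and every triple of columns of a small triply balanced matrix; the 8x4 matrix and the helper name `pattern_counts` are assumptions introduced for this illustration.

```python
from collections import Counter
from itertools import combinations


def pattern_counts(A, columns):
    """Frequency of each sign pattern appearing in the chosen columns of A."""
    return Counter(tuple(row[c] for c in columns) for row in A)


# Illustrative 8x4 triply balanced matrix (rows as lists of +1/-1).
A = [
    [ 1,  1,  1,  1],
    [ 1, -1,  1, -1],
    [ 1,  1, -1, -1],
    [ 1, -1, -1,  1],
    [-1, -1, -1, -1],
    [-1,  1, -1,  1],
    [-1, -1,  1,  1],
    [-1,  1,  1, -1],
]
R, L = len(A), len(A[0])

# Part (1): all 4 pair patterns occur, each R/4 times.
pairs_ok = all(
    len(pattern_counts(A, cols)) == 4
    and set(pattern_counts(A, cols).values()) == {R // 4}
    for cols in combinations(range(L), 2)
)

# Part (2): all 8 triple patterns occur, each R/8 times.
triples_ok = all(
    len(pattern_counts(A, cols)) == 8
    and set(pattern_counts(A, cols).values()) == {R // 8}
    for cols in combinations(range(L), 3)
)
```

Here R = 8, so each pair pattern is expected twice and each triple pattern once.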

Note that in each column of a triply balanced matrix A
the number of plus ones is the same as the number of minus
ones as required by condition (i). In every pair of
columns of A, each type of pair (+,+), (+,-), (-,+) and
(-,-) occurs equally frequently by part (1) of Lemma 3.1.
In every three columns of A, each type of triple (+,+,+),
(+,+,-), (+,-,+), (+,-,-), (-,+,+), (-,+,-), (-,-,+)
and (-,-,-) occurs equally frequently by part (2) of
Lemma 3.1. This observation justifies the term balanced
used for such matrices. More importantly we can conclude
the following theorem.


THEOREM 3.1. Any R x L triply balanced matrix A is
an orthogonal array OA(R,L,2,3;$\lambda$) in R runs, L
constraints, 2 symbols, strength 3 and index $\lambda = R/8$.
Conversely, any OA(R,L,2,3;R/8) is an R x L triply
balanced matrix, subject to a possible notational change
of the two symbols of the array to + and -.

Fortunately, the existence and construction of orthogonal
arrays OA(R,L,2,3;$\lambda$) have been studied extensively in
the literature. Hadamard matrices have been used to
construct such arrays. For example, if H is a Hadamard
matrix of order 4t, then it is well known that

$$A = \begin{bmatrix} H \\ -H \end{bmatrix}$$

is an OA(8t,4t,2,3;t) or, as we established here, a
triply balanced 8t x 4t matrix. For a summary article
on Hadamard matrices see Hedayat and Wallis (1978).
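
A minimal sketch of this construction follows, under the assumption that H is obtained by Sylvester's doubling method, which only yields orders that are powers of 2 (other orders need other Hadamard constructions, not shown). The function names `sylvester`, `fold_over` and `has_strength` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sylvester(k):
    """Sylvester Hadamard matrix of order 2**k, as a list of rows."""
    H = [[1]]
    for _ in range(k):
        # Doubling step: [[H, H], [H, -H]].
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def fold_over(H):
    """Stack H on top of -H."""
    return [row[:] for row in H] + [[-x for x in row] for row in H]


def has_strength(A, t):
    """True if every t columns of A show all 2**t sign patterns equally often."""
    R, L = len(A), len(A[0])
    if R % (2 ** t) != 0:
        return False
    index = R // 2 ** t
    for cols in combinations(range(L), t):
        counts = Counter(tuple(row[c] for c in cols) for row in A)
        if len(counts) != 2 ** t or set(counts.values()) != {index}:
            return False
    return True


H8 = sylvester(3)                    # Hadamard matrix of order 8 (t = 2)
A16x8 = fold_over(H8)                # 16 x 8, index 16/8 = 2
A16x5 = [row[:5] for row in A16x8]   # deleting columns keeps strength 3
```

Dropping columns from the folded-over matrix is one way to obtain an R x L triply balanced matrix with L < R/2. The same 16 x 8 array does not have strength 4, so the construction should not be read as giving more than triple balance.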

Indeed,  we  can  establish  a  stronger  relationship  between 
triply  balanced  matrices  and  Hadamard  matrices  as  indicated 
in  the  following: 

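PROPOSITION 3.1. A triply balanced R x R/2 matrix A
exists if and only if a Hadamard matrix of order R/2
exists.

PROOF. The sufficiency is indicated above. To prove the
necessity, without loss of generality we can put A in
the form

$$A = \begin{bmatrix} 1 & C \\ -1 & D \end{bmatrix}$$

where 1 is an R/2 x 1 column vector of plus ones. Any
two columns of $\begin{bmatrix} C \\ D \end{bmatrix}$ are orthogonal, and this taken together
with condition (iii) applied to the first column of A implies that any two columns
of C as well as of D are orthogonal. Similarly, conditions (i) and (ii)
show that each column of C is orthogonal to 1, so that [1 | C] is a
Hadamard matrix of order R/2. From this the
result follows.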

It is worth noting that from Margolin (1969) it
follows that D = -C, that is, in the terminology of design
of experiments [-1 | D] is the fold-over of [1 | C].
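
The necessity argument can be traced on a small case. In the sketch below, the rows of a triply balanced R x R/2 matrix are split according to the sign of their first entry; the block with +1 in the first column plays the role of [1 | C]. The input matrix (an 8x4 example with its rows deliberately interleaved) and the helper names are assumptions made for illustration. Because the rows are not sorted, the fold-over relation is checked only up to a reordering of rows.

```python
def split_by_first_column(A):
    """Separate the rows of A by the sign of their first entry."""
    top = [row for row in A if row[0] == 1]
    bottom = [row for row in A if row[0] == -1]
    return top, bottom


def is_hadamard(H):
    """True if H is square with mutually orthogonal +1/-1 columns."""
    n = len(H)
    if any(len(row) != n for row in H):
        return False
    cols = list(zip(*H))
    return all(
        sum(x * y for x, y in zip(cols[i], cols[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


# Illustrative triply balanced 8x4 matrix with interleaved rows.
A = [
    [ 1,  1,  1,  1],
    [-1,  1, -1,  1],
    [ 1, -1,  1, -1],
    [-1, -1, -1, -1],
    [ 1,  1, -1, -1],
    [-1,  1,  1, -1],
    [ 1, -1, -1,  1],
    [-1, -1,  1,  1],
]

top, bottom = split_by_first_column(A)
# Fold-over check: negating the bottom rows reproduces the top rows
# (compared as sorted lists, i.e. up to row order).
is_fold_over = sorted(top) == sorted([-x for x in row] for row in bottom)
```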

4. Discussion. In the context of sampling, for a
given value of L we are interested in finding an R x L
triply balanced matrix with R minimum. From Bose and
Bush (1952) we know that in an OA(R,L,2,3;$\lambda$) with
$R \equiv 0 \pmod 8$, we must have $R \ge 2L$. Indeed, the
lower bound on R is achieved via the construction based
on Hadamard matrices mentioned above.
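
Combining the two requirements, R a multiple of 8 and $R \ge 2L$, gives a simple candidate for the smallest number of half subsamples. The helper below, whose name `min_runs` is hypothetical, only evaluates this arithmetic bound; whether the bound is attained for a given L still depends on a Hadamard matrix of order R/2 being available.

```python
def min_runs(L):
    """Smallest multiple of 8 that is at least 2L, for L >= 3 strata."""
    if L < 3:
        raise ValueError("the triple balance condition needs L >= 3")
    # 8 * ceil(L / 4) is the least multiple of 8 not below 2L.
    return 8 * ((L + 3) // 4)


candidates = {L: min_runs(L) for L in (3, 4, 5, 10)}
```

For instance, L = 10 strata lead to R = 24, which relies on a Hadamard matrix of order 12.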

Let us summarize our findings. R x L triply balanced
matrices arise in cross-validation studies and in estimating
the mean square errors of nonlinear statistics in large
scale survey sampling. Consequently, there is a need to
investigate when such matrices can be constructed so that
practitioners in survey sampling can utilize them in their
work. We have shown here that:

1. Any R x L triply balanced matrix and an orthogonal
array OA(R,L,2,3; R/8) are one and the same object up
to a possible notational change of the two symbols of the
array to + and - respectively.

2. R is a multiple of 8 and $L \le R/2$.

3. The problem of the construction of R x L triply
balanced matrices, $3 \le L \le R/2$, is completely resolved
modulo the existence of Hadamard matrices of order R/2.

In closing we might point out that in a similar fashion
as in the argument given in Section 3, it can be shown that
an orthogonal array OA(R,L,2,t; $R/2^t$) with $t \ge 4$ is
a t-ply balanced matrix and conversely. Here a t-ply
balanced R x L matrix for $t \ge 4$ would have the obvious
extension of the definition given here for the case t = 3,
that is the case of triply balanced matrices. Whether or
not such t-ply balanced matrices with $t \ge 4$ might be of
use in survey sampling is yet to be seen.